Conversation

cjluo-nv
Collaborator

@cjluo-nv cjluo-nv commented Sep 17, 2025

What does this PR do?

Revert removal of int8_sq support for vlm_ptq example.

Overview:

With this change, int8_sq for vlm_ptq will again generate a TensorRT-LLM checkpoint, covering the language model only. Users need to deploy it with TensorRT-LLM's TRT backend and build the vision network separately with TRT for inference; the torch runtime is not available for this format.

Summary by CodeRabbit

  • New Features

    • Added support for the int8_sq quantization format in the example quantization workflow.
    • Introduced flags to configure inference tensor and pipeline parallelism during quantization.
    • Improved error messaging to clearly list accepted quantization options.
  • Documentation

    • Updated the changelog by removing the deprecation note for the int8_sq quantization format in version 0.37.

@cjluo-nv cjluo-nv requested a review from a team as a code owner September 17, 2025 22:51
@cjluo-nv cjluo-nv requested review from meenchen and kevalmorabia97 and removed request for meenchen September 17, 2025 22:51

coderabbitai bot commented Sep 17, 2025

Walkthrough

Removed a deprecation note for int8_sq in CHANGELOG. Updated the VLM PTQ Hugging Face example script to accept int8_sq, improved the unknown-arg error message, and added inference tensor/pipeline parallel flags when invoking the quantization script.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Changelog edit: CHANGELOG.rst | Removed the 0.37 deprecation bullet for int8_sq; no other content changes. |
| VLM PTQ script update: examples/vlm_ptq/scripts/huggingface_example.sh | Added int8_sq to accepted QFORMAT values; updated the error message to include it; passed --inference_tensor_parallel=$TP and --inference_pipeline_parallel=$PP to hf_ptq.py. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant SH as huggingface_example.sh
    participant PTQ as hf_ptq.py
    participant BE as Inference Backend

    U->>SH: Run with QFORMAT and TP/PP
    alt QFORMAT recognized (e.g., fp8, int8_sq, int4_awq, ...)
        note right of SH: Includes new int8_sq option
        SH->>PTQ: Invoke with --qformat QFORMAT<br/>--inference_tensor_parallel=$TP<br/>--inference_pipeline_parallel=$PP
        PTQ->>BE: Configure model + quantization
        BE-->>PTQ: Ready
        PTQ-->>SH: Quantization artifacts
        SH-->>U: Success
    else Unknown QFORMAT
        SH-->>U: Error listing accepted values (incl. int8_sq)
    end
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10–15 minutes

Poem

A hop, a bop, I tweak the log—
int8_sq returns, no epilogue.
The script now threads in tidy pairs,
TP, PP—parallel hares.
With flags aligned and formats true,
I thump the ground: “We’re good to queue!” 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title "Reinstate int8_sq support for vlm_example." is a concise, single sentence that directly describes the primary change: restoring int8_sq support for the VLM example. It aligns with the PR objectives and the file diffs (changes to examples/vlm_ptq and the CHANGELOG), so a reviewer scanning history can quickly understand the main intent. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

@cjluo-nv cjluo-nv requested a review from Edwardf0t1 September 17, 2025 22:51

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
examples/vlm_ptq/scripts/huggingface_example.sh (2)

92-100: Guard TP/PP flags to avoid passing empty values.

If TP/PP are unset, the flags expand to empty values (e.g. --inference_tensor_parallel=), which argparse may reject. Default to 1, or append the flags only when the variables are set.

Apply one of these:

Option A (defaults):

```diff
-            --inference_tensor_parallel=$TP \
-            --inference_pipeline_parallel=$PP \
+            --inference_tensor_parallel=${TP:-1} \
+            --inference_pipeline_parallel=${PP:-1} \
```

Option B (conditional append earlier, keeping this block unchanged):

```diff
 PTQ_ARGS=""
+if [ -n "$TP" ]; then PTQ_ARGS+=" --inference_tensor_parallel $TP "; fi
+if [ -n "$PP" ]; then PTQ_ARGS+=" --inference_pipeline_parallel $PP "; fi
```
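For reference, the `${VAR:-1}` expansion used in Option A substitutes the default only when the variable is unset or empty, which is exactly the guard needed here. A minimal standalone sketch (variable name `TP` reused for illustration):

```shell
#!/bin/sh
# Unset variable: ${TP:-1} expands to the default 1
unset TP
echo "tp=${TP:-1}"   # tp=1

# Set variable: the actual value wins over the default
TP=4
echo "tp=${TP:-1}"   # tp=4

# Empty string also falls back to the default because of the colon;
# ${TP-1} (no colon) would keep the empty value instead
TP=""
echo "tp=${TP:-1}"   # tp=1
```

Either way, hf_ptq.py never sees a bare `--inference_tensor_parallel=` argument.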

106-118: nvfp4 GPU check is unreachable due to the earlier non‑fp8 early‑exit.

The block at Lines 111–118 never executes when QFORMAT=nvfp4. Fold it into the non‑fp8 branch so users still get the Blackwell warning.

Apply:

```diff
-if [[ "$QFORMAT" != "fp8" ]]; then
-    echo "For quant format $QFORMAT, please refer to the TensorRT-LLM documentation for deployment. Checkpoint saved to $SAVE_PATH."
-    exit 0
-fi
-
-if [[ "$QFORMAT" == *"nvfp4"* ]] || [[ "$KV_CACHE_QUANT" == *"nvfp4"* ]]; then
-    cuda_major=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader -i 0 | cut -d. -f1)
-
-    if [ "$cuda_major" -lt 10 ]; then
-        echo "Please deploy the NVFP4 checkpoint on a Blackwell GPU. Checkpoint export_path: $SAVE_PATH"
-        exit 0
-    fi
-fi
+if [[ "$QFORMAT" != "fp8" ]]; then
+    if [[ "$QFORMAT" == *"nvfp4"* ]] || [[ "$KV_CACHE_QUANT" == *"nvfp4"* ]]; then
+        cuda_major=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader -i 0 | cut -d. -f1)
+        if [ "$cuda_major" -lt 10 ]; then
+            echo "Please deploy the NVFP4 checkpoint on a Blackwell GPU. Checkpoint export_path: $SAVE_PATH"
+        fi
+    fi
+    echo "For quant format $QFORMAT, please refer to the TensorRT-LLM documentation for deployment. Checkpoint saved to $SAVE_PATH."
+    exit 0
+fi
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 146d1d9 and 4bed5de.

📒 Files selected for processing (2)
  • CHANGELOG.rst (0 hunks)
  • examples/vlm_ptq/scripts/huggingface_example.sh (2 hunks)
💤 Files with no reviewable changes (1)
  • CHANGELOG.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: code-quality
  • GitHub Check: build-docs

Comment on lines 37 to 42
```diff
 case $QFORMAT in
-fp8|int4_awq|w4a8_awq|nvfp4)
+fp8|int8_sq|int4_awq|w4a8_awq|nvfp4)
     ;;
 *)
-    echo "Unknown quant argument: Expected one of: [fp8, int4_awq, w4a8_awq, nvfp4]" >&2
+    echo "Unknown quant argument: Expected one of: [fp8, int8_sq, int4_awq, w4a8_awq, nvfp4]" >&2
     exit 1
```

⚠️ Potential issue

🧩 Analysis chain

Re‑enablement of int8_sq looks good.

Case arm and error message now include int8_sq; matches the PR goal.

Run to confirm downstream accepts this qformat:


🏁 Script executed:

```shell
#!/bin/bash
# Verify hf_ptq.py accepts int8_sq for --qformat and exposes the new inference flags
fd -t f hf_ptq.py
rg -nP -C2 -- 'add_argument\((["'\''])--qformat\1.*choices=.*int8_sq' $(fd -t f hf_ptq.py)
rg -nP -C2 -- 'add_argument\((["'\''])--inference_tensor_parallel\1' $(fd -t f hf_ptq.py)
rg -nP -C2 -- 'add_argument\((["'\''])--inference_pipeline_parallel\1' $(fd -t f hf_ptq.py)

# Check parser provides TP/PP defaults
fd -t f parser.sh
rg -n -C2 '\b(TP|PP)=' $(fd -t f parser.sh)
```
Length of output: 1012


Add int8_sq to the --qformat choices in examples/llm_ptq/hf_ptq.py: hf_ptq.py's argparse definition for --qformat does not include int8_sq (no match found). The --inference_tensor_parallel / --inference_pipeline_parallel flags are present, and examples/llm_ptq/scripts/parser.sh sets TP=1 and PP=1.

🤖 Prompt for AI Agents
In examples/vlm_ptq/scripts/huggingface_example.sh around lines 37-42 the script
accepts "int8_sq" in the case switch but examples/llm_ptq/hf_ptq.py's argparse
for --qformat does not include "int8_sq", causing a mismatch; update the
argparse definition in examples/llm_ptq/hf_ptq.py to include "int8_sq" among the
allowed choices for --qformat (keeping other existing options intact), and
ensure help text reflects this addition so the CLI and example script are
consistent with TP/PP handling from parser.sh.
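To make the requested fix concrete, here is a minimal sketch of the argparse shape the review comment asks for. This is an illustrative reconstruction, not the actual hf_ptq.py source: the real choices list, defaults, and help text may differ, and only the flag names come from the review.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical reconstruction: "int8_sq" added alongside the options the
# example script already accepts (fp8, int4_awq, w4a8_awq, nvfp4).
parser.add_argument(
    "--qformat",
    default="fp8",
    choices=["fp8", "int8_sq", "int4_awq", "w4a8_awq", "nvfp4"],
    help="Quantization format (int8_sq exports a TensorRT-LLM checkpoint only)",
)
# Flags the example script now passes; parser.sh defaults both to 1.
parser.add_argument("--inference_tensor_parallel", type=int, default=1)
parser.add_argument("--inference_pipeline_parallel", type=int, default=1)

args = parser.parse_args(["--qformat", "int8_sq"])
print(args.qformat)  # int8_sq
```

With the choice added, the shell script's case arm and the Python CLI accept the same set of formats, so a valid QFORMAT can no longer fail inside hf_ptq.py.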


codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.82%. Comparing base (146d1d9) to head (4bed5de).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #333      +/-   ##
==========================================
- Coverage   73.82%   73.82%   -0.01%
==========================================
  Files         172      172
  Lines       17438    17438
==========================================
- Hits        12874    12873       -1
- Misses       4564     4565       +1
```

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@kevalmorabia97 kevalmorabia97 merged commit 912f3dc into main Sep 18, 2025
22 checks passed
@kevalmorabia97 kevalmorabia97 deleted the chenjiel/reinstate_int8 branch September 18, 2025 05:13
kevalmorabia97 pushed a commit that referenced this pull request Sep 18, 2025
yeyu-nvidia pushed a commit that referenced this pull request Sep 18, 2025